Client-Server Architectures and Methods for Zoomable User Interfaces

ABSTRACT

Exemplary embodiments of the present invention provide methods and systems for communicating and processing data in communication networks, e.g., cable networks and/or interactive television networks. Selective use of different data streams and encoding techniques enables sophisticated user interfaces to be generated on client devices having varying processing capabilities. MPEG encoding techniques have reduced complexity to enable better response time to user requests. Specialized user interface features, such as hoverzooming, are enabled.

RELATED APPLICATION

This application is related to, and claims priority from, U.S. patent application Ser. No. 11/144,880, filed Jun. 3, 2005, which claims priority to U.S. Provisional Patent Application Ser. No. 60/576,786, filed on Jun. 3, 2004, entitled “ZUI on PVR Architecture Specification”, the disclosure of which is incorporated here by reference.

BACKGROUND

The present invention describes systems and methods for processing and transferring multimedia data between nodes in a communication system, e.g., an interactive television system, usable to create, for example, sophisticated entertainment user interfaces in the home.

Technologies associated with the communication of information have evolved rapidly over the last several decades. Television, cellular telephony, the Internet and optical communication techniques (to name just a few things) combine to inundate consumers with available information and entertainment options. Taking television as an example, the last three decades have seen the introduction of cable television service, satellite television service, pay-per-view movies and video-on-demand. Whereas television viewers of the 1960s could typically receive perhaps four or five over-the-air TV channels on their television sets, today's TV watchers have the opportunity to select from hundreds and potentially thousands of channels of shows and information. Video-on-demand technology, currently used primarily in hotels and the like, provides the potential for in-home entertainment selection from among thousands of movie titles. Digital video recording (DVR) equipment such as offered by TiVo, Inc., 2160 Gold Street, Alviso, Calif. 95002, further expands the available choices.

The technological ability to provide so much information and content to end users provides both opportunities and challenges to system designers and service providers. One challenge is that while end users typically prefer having more choices rather than fewer, this preference is counterweighted by their desire that the selection process be both fast and simple. Unfortunately, the development of the systems and interfaces by which end users access media items has resulted in selection processes which are neither fast nor simple. Consider again the example of television programs. When television was in its infancy, determining which program to watch was a relatively simple process primarily due to the small number of choices. One would consult a printed guide which was formatted, for example, as a series of columns and rows which showed the correspondence between (1) nearby television channels, (2) programs being transmitted on those channels and (3) date and time. The television was tuned to the desired channel by adjusting a tuner knob and the viewer watched the selected program. Later, remote control devices were introduced that permitted viewers to tune the television from a distance. This addition to the user-television interface created the phenomenon known as “channel surfing” whereby a viewer could rapidly view short segments being broadcast on a number of channels to quickly learn what programs were available at any given time.

Despite the fact that the number of channels and amount of viewable content has dramatically increased, the generally available user interface and control device options and frameworks for televisions have not changed much over the last 30 years. Printed guides are still the most prevalent mechanism for conveying programming information. The multiple-button remote control with simple up and down arrows is still the most prevalent channel/content selection mechanism. The reaction of those who design and implement the TV user interface to the increase in available media content has been a straightforward extension of the existing selection procedures and interface objects. Thus, the number of rows and columns in the printed guides has been increased to accommodate more channels. The number of buttons on the remote control devices has been increased to support additional functionality and content handling. However, this approach has significantly increased both the time required for a viewer to review the available information and the complexity of actions required to implement a selection. Arguably, the cumbersome nature of the existing interface has hampered commercial implementation of some services, e.g., video-on-demand, since consumers are resistant to new services that will add complexity to an interface that they view as already too slow and complex.

An exemplary control framework having a zoomable graphical user interface for organizing, selecting and launching media items is described in U.S. patent application Ser. No. 10/768,432, filed on Jan. 30, 2004 to Frank A. Hunleth, the disclosure of which is incorporated here by reference. This framework provides exemplary solutions to the afore-described problems of conventional interfaces. Among other things, such exemplary frameworks provide mechanisms which display metadata associated with media items available for selection by a user in a manner which is easy to use, but allows a large number of different media items to be accessible. One feature of exemplary frameworks described in this patent application is the use of zooming to provide, among other things, visually informative transitions between different semantic levels of media objects displayed by the interface and as a mechanism for highlighting objects currently being considered by a user.

The implementation of these types of advanced user interfaces is complicated by the system architectures and communication nodes involved in the processing and transport of data used to generate these interfaces from various sources to an end user's device, e.g., a television. As will be described in more detail below, this data includes so-called metadata that describes the media content. The term “metadata” as it is used herein refers to all of the supplementary information that describes the particular content of interest associated with media items available for selection by a user. As an example for movie objects, the metadata could include, e.g., the title, description, genre, cast, DVD cover art, price/availability, cast bios and filmographies, links to similar movies, critical reviews, user reviews, the rights associated with the metadata itself, rights associated with the content, advertising metadata linked to the content of interest, etc. An exemplary system for capturing, processing, synthesizing and forwarding metadata suitable for such advanced user interfaces is described in U.S. patent application Ser. No. 11/037,897 entitled “A Metadata Brokering Server and Method”, filed on Jan. 18, 2005, the disclosure of which is incorporated here by reference.

Once captured and processed, however, the data needs to be communicated from, for example, a head-end portion of the system to, for example, a set-top box in a manner which enables sufficient data to be supplied to render rich user interfaces, while at the same time being sensitive to time delay and operating within the constraints imposed by legacy hardware. Accordingly, it would be desirable to provide architectures and methods which resolve these conflicting parameters and enable advanced user interfaces to be generated.

SUMMARY

Exemplary embodiments of the present invention provide methods and systems for communicating and processing data in communication networks, e.g., cable networks and/or interactive television networks. Selective use of different data streams and encoding techniques enables sophisticated user interfaces to be generated on client devices having varying processing capabilities.

According to one exemplary embodiment of the present invention, a method for transmitting data from an upstream node to a client device in a cable communication network includes the steps of selectively identifying data to be transmitted from the upstream node to the client device as either first data or second data, encoding the first data using MPEG encoding, transmitting the MPEG encoded data via an MPEG data stream to the client device, encoding the second data using a second type of encoding which is different than MPEG encoding and transmitting the encoded second data using a second data stream to the client device.

According to another exemplary embodiment of the present invention, a method for generating a hoverzoom effect on a user interface includes the steps of transmitting background layer data and foreground data to a client device, displaying the background layer, identifying a user action associated with the hoverzoom effect, and displaying, in response to the user action, said foreground layer as an overlay on the background layer.

According to yet another exemplary embodiment of the present invention, a method for MPEG encoding data to be transmitted from an upstream node to a client device includes the steps of estimating motion vectors associated with a user interface, sending the motion vectors to an MPEG encoder, and MPEG encoding the data to be transmitted using the estimated motion vectors.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings illustrate exemplary embodiments of the present invention, wherein:

FIGS. 1(a) and 1(b) depict screens of a user interface showing a hoverzoom feature which can be generated using data processed in accordance with the present invention;

FIG. 2 depicts another screen of a user interface which can be generated using data processed in accordance with the present invention;

FIG. 3 is a table showing exemplary metadata types and sources;

FIG. 4 shows a client-server architecture according to exemplary embodiments of the present invention;

FIG. 5 illustrates the MPEG-2 transition and scene encoder of FIG. 4 in more detail in accordance with an exemplary embodiment of the present invention;

FIG. 6 illustrates the scene request processor of FIG. 4 in more detail in accordance with an exemplary embodiment of the present invention;

FIG. 7 illustrates the client UI state machine of FIG. 4 in more detail in accordance with an exemplary embodiment of the present invention;

FIG. 8 depicts an exemplary messaging interaction between an event processor, scene loader, exclusive scene and overlay scene in accordance with an exemplary embodiment of the present invention;

FIG. 9 shows another exemplary messaging interaction associated with architecture and methods in accordance with the present invention;

FIG. 10 depicts a technique for encoding data associated with a hoverzoom effect according to an exemplary embodiment of the present invention; and

FIG. 11 illustrates selective encoding of data for transmission to a client device according to an exemplary embodiment of the present invention.

DETAILED DESCRIPTION

The following detailed description of the invention refers to the accompanying drawings. The same reference numbers in different drawings identify the same or similar elements. Also, the following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims.

In order to provide some context for this discussion, exemplary user interface screens which can be created using data and instructions forwarded from a server to a client in accordance with exemplary embodiments of the present invention are shown in FIGS. 1(a) and 1(b). Therein, a portion of an exemplary user interface screen which can be generated based on information transferred to an end user's system (e.g., set-top box/television or personal computer) shows ten media selection items. For more information regarding this purely exemplary interface, including previous screens and navigation techniques, the interested reader is directed to the above-incorporated by reference U.S. patent application Ser. No. 10/768,432. It will be appreciated that such user interfaces are purely exemplary and that architectures and methods in accordance with the present invention can be implemented to support other interfaces.

FIG. 1(a) shows a user interface screen having a plurality of media objects available for selection as images, e.g., DVD cover art. In FIG. 1(b), the image associated with the movie “Apollo 13” has been magnified as a result of a preliminary selection activity, e.g., a user passing a cursor (not shown) over this image on the display screen. This feature, referred to as a hoverzoom effect and described in more detail below under the heading “Hoverzoom”, can be achieved by transmitting data (e.g., metadata) and instructions between nodes, e.g., a headend and a set-top box, according to exemplary embodiments of the present invention. At lower levels of the user interface, additional data, e.g., metadata delivered from content providers, can be used to generate the user interface screen. For example, as shown in FIG. 2, user selection of this magnified image, e.g., by depressing a button on an input device (not shown), can result in a further zoom to display additional details. For example, information about the movie “Apollo 13” including, among other things, the movie's runtime, price and actor information is shown. Those skilled in the art will appreciate that other types of information could be provided here. Additionally, this GUI screen includes GUI control objects including, for example, button control objects for buying the movie, watching a trailer or returning to the previous GUI screen (which could also be accomplished by depressing the ZOOM OUT button on the input device). Hyperlinks generated from metadata processed in a manner described below can also be used to allow the user to jump to, for example, GUI screens associated with the related movies identified in the lower right hand corner of the GUI screen of FIG. 2 or information associated with the actors in this movie. In this example, some or all of the film titles under the heading “Filmography” can be implemented as hyperlinks which, when actuated by the user via the input device, will cause the GUI to display a GUI screen corresponding to that of FIG. 2 for the indicated movie. Some or all of the information used to generate the interface screens of FIGS. 1(a), 1(b) and 2 comes from metadata provided by one or more metadata providers and processed in accordance with exemplary embodiments of the present invention as will now be described.
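By way of illustration only, the following sketch shows one way a client could composite a pre-delivered foreground image over the background layer when a hoverzoom is triggered. The image representation (flat lists of RGB tuples plus a 0/1 mask) and the function name are assumptions made for this example, not the encoding actually used by the system.

def apply_hoverzoom(background, foreground, mask, x, y):
    # background/foreground: dicts with "w", "h" and a flat row-major list of
    # (r, g, b) pixels; mask: flat list of 0/1 flags, same size as foreground.
    out = dict(background)
    out["pixels"] = list(background["pixels"])
    fw, fh, bw = foreground["w"], foreground["h"], background["w"]
    for row in range(fh):
        for col in range(fw):
            if mask[row * fw + col]:
                # Copy only the foreground pixels selected by the mask.
                out["pixels"][(y + row) * bw + (x + col)] = foreground["pixels"][row * fw + col]
    return out

# Tiny usage example: a 1x1 foreground pasted onto a 2x2 background.
bg = {"w": 2, "h": 2, "pixels": [(0, 0, 0)] * 4}
fg = {"w": 1, "h": 1, "pixels": [(255, 255, 255)]}
print(apply_hoverzoom(bg, fg, [1], 1, 0)["pixels"])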

The interface screens shown in FIGS. 1(a), 1(b) and 2 are purely exemplary and metadata (and other data) transferred and processed in accordance with the present invention can be used to support other interfaces or for purposes other than interface generation. Likewise, many different types of information can be received and processed in accordance with the present invention. Examples of metadata types, sources and associated uses, e.g., for a TV browser interface, a video-on-demand (VOD) interface or a music browser, are shown in the table of FIG. 3. Of particular interest for this detailed discussion are the zooming features associated with user interfaces generated in accordance with these exemplary embodiments of the present invention. Although the present invention is not limited to techniques or systems for generating zoomable user interfaces, some of the client/server features discussed herein are particularly beneficial for use in conjunction with user interfaces which include zooming transitions between user interface screens. For the purpose of this detailed description, the terms “zoom”, “zoomable” and “zooming” refer to techniques wherein a user interface action results in changes to the displayed portion of the user interface that create a change of perspective which is consistent and informative to the user. Zooming will typically include changes in object magnification (e.g., camera-style zooming), but is expressly not limited thereto. For example, another aspect of zooming in accordance with user interfaces is semantic zooming, which includes the modification of a zoomed object in a manner which is independent of magnification, e.g., the addition of text or a graphic to an object which was not present as part of the object (at any level of magnification) prior to the semantic zoom. For more information related to zoomable user interfaces, the interested reader is referred to the above-identified, incorporated by reference patent application.

For context, one example of a zooming transition in accordance with exemplary embodiments of the present invention is the zooming transition between the user interface screens of FIGS. 1(a) and 1(b), which involves a magnification change of a hoverzoomed object and, optionally, semantic zooming to that object as well. Another example is found in the transition between the user interface screen of FIG. 1(b) and FIG. 2, wherein the image associated with “Apollo 13” has its magnification changed (e.g., enlarged in FIG. 2 relative to the similar image shown in FIG. 1(b)) and translated for use in FIG. 2. Panning effects can also be used to animate the zooming transition.

A general client-server architecture 40 for providing data processing and transport according to an exemplary embodiment of the present invention is shown in FIG. 4. Therein, a user interface server 42 communicates with a client device 44 to generate a user interface on a display device 46 in conjunction with inputs from, for example, a pointing device 48. Communication of data, e.g., metadata and content data, between the user interface server 42 and the client device 44 can involve any number of intermediate nodes (not shown) between the user interface server 42 and the client device 44 including hubs, distribution servers, and the like. Moreover, some or all of the functional elements illustrated as being part of the user interface server 42 can be located within one or more of these intermediate nodes or reside at the headend of the system 40. The display device 46 can, for example, be a television, a computer monitor/display, or any other display device. The client device 44 can be embodied as a set-top box, a personal computer, or any other device including a processing unit. The pointer 48 can, for example, be a three-dimensional (hereinafter “3D”) pointing device, a mouse, a remote control device, a track ball, a joystick, or any other device capable of providing a pointing capability and can be connected to the client device 44 either via wireline or wirelessly.

According to this exemplary embodiment of the present invention, the server 42 includes a transition and screen capturer 50, an MPEG-2 transition and scene encoder 52, an MPEG and ZSD cache 54, a scene request processor 56 and an MPEG stream transmitter 58, which components operate to generate and manage the streaming of MPEG-2 data to client devices 44, and to receive and respond to upstream requests from clients 44. The transition and screen capturer 50 automates the gathering of scene data used to generate the user interface. At a high level, this can be accomplished by navigating through, e.g., a scene graph provided as input to the transition and screen capturer 50, along with metadata and content, and calling the MPEG-2 transition and scene encoder 52 to generate MPEG-2 clips and scene description files associated with selected scenes to be displayed on display device 46. Detailed information associated with scene description files and formats (also referred to herein as “ZSD data”) according to exemplary embodiments of the present invention is provided below under the header “Scene Description Data Format”.

Navigation through the scene graph involves capturing and processing data associated with the various scenes which can be generated by the user interface. A “scene” as that term is used herein generally refers to the framework associated with any user interface screen which can be generated by the user interface which, despite the sophisticated and dynamic nature of user interfaces in accordance with the present invention, are all known a priori, although at least some of the data used to populate the scenes will vary, e.g., over time as content providers change, for example, metadata associated with their offerings. Thus, although FIGS. 1(a), 1(b) and 2 show only portions of user interface screens, each of those complete screens would be considered to be a scene. Table 1 below lists exemplary data which can be collected for each transition and Table 2 lists exemplary data for each scene:

TABLE 1. Per-Transition Information

From Scene ID: The scene ID of the starting scene.
To Scene ID: The scene ID of the destination scene.
Focus Command: The command to move the focus in the interface to the icon, button, etc. that causes the transition when selected. An example of a focus command is to move the mouse pointer over an icon to cause it to focus. Another focus command could directly activate a hoverzoom effect.
Activation Command: This command activates the icon, button, etc. to start the transition from the “From Location” to the “To Location”.

TABLE 2. Scene Information

Scene ID: The scene ID of this scene.
Location: The interface location instance for the starting scene.
Scene Description: The user-supplied description or an automatically generated description.

The transition and scene capturer 50 is thus able to acquire all of the information necessary to simulate all desired transitions in the user interface from, for example, a database (not shown in FIG. 4) which contains the complete user interface “universe”. The transition and scene capturer 50 includes navigator controller and capture controller components which become active as a user generates inputs to the interface which command scene transitions. At a high level, the navigation controller has the responsibility of navigating to and from every transition and scene. An exemplary navigation controller performs the following operations: (1) obtain the next transition, (2) navigate to the “from” scene, (3) execute a focus command for this transition, (4) notify the capture controller with the scene and transition information, (5) execute the activation command, (6) notify the capture controller when the animation completes, (7) notify the capture controller with the scene and transition information reversed (for the back transition), (8) invoke a goBack( ) routine, and (9) notify the capture controller when the animation completes.
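The nine operations above can be read as a simple capture loop. The sketch below is a rough illustration only; the Transition fields and the navigator/capture interfaces are stand-ins for whatever the real scene graph framework exposes, not APIs from this application.

from dataclasses import dataclass

@dataclass
class Transition:
    from_scene: str
    to_scene: str
    focus_command: str
    activation_command: str

def capture_all(transitions, ui, capture):
    # `ui` and `capture` are assumed to provide the calls used below.
    for t in transitions:                            # (1) obtain the next transition
        ui.navigate_to(t.from_scene)                 # (2) navigate to the "from" scene
        ui.execute(t.focus_command)                  # (3) execute the focus command
        capture.begin(t.from_scene, t.to_scene)      # (4) pass scene/transition info
        ui.execute(t.activation_command)             # (5) execute the activation command
        ui.wait_for_animation()
        capture.end()                                # (6) animation complete
        capture.begin(t.to_scene, t.from_scene)      # (7) reversed info for the back transition
        ui.go_back()                                 # (8) invoke goBack()
        ui.wait_for_animation()
        capture.end()                                # (9) animation complete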

The capture controller integrates with the MPEG-2 transition and scene encoder 52 to create the MPEG-2 clips and ZSD files. The capture controller receives notifications from the navigation controller when the transition begins and ends and invokes routines on the MPEG-2 transition and scene encoder at every animation step. To provide a visual indication of the progress to the user, the capture controller ensures that the canvas still paints the visible scene graph to the scene and adds a text overlay that indicates the percent of transitions executed.

A detailed example of an MPEG-2 transition and scene encoder 52 according to an exemplary embodiment of the present invention is shown in FIG. 5. Raw scene data, e.g., images, text, metadata, etc., is delivered from the transition and screen capturer 50 and provided to an object extraction unit 502, a client-rendered feature extraction unit 504 and a video information extraction unit 506. The object extraction unit 502 (handling user-interactable objects on the user interface screens) and client-rendered feature extraction unit 504 (handling, e.g., hoverzoom and text, features to be rendered by the client device 44) operate, under the control of the render-location controller 508, to extract information from the raw data stream and provide it to the ZSD encoder 507, which encodes the extracted information using the scene description format described in detail below. None, some or all of the ZSD encoded data can be sent within the MPEG data stream, for example as part of the private data fields within MPEG frames, using MPEG-2 data encapsulator 509, while other ZSD encoded data can be transmitted using the OOB link described above with respect to FIG. 4.

The video information extraction unit 506 operates to extract video information suitable for MPEG-2 encoding, again under the control of the render location controller 508. The ability of render location controller 508 to selectively determine which type of encoding to apply to particular data, in this example MPEG or ZSD encoding, and the benefits associated therewith are described in more detail below with respect to FIG. 11.
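As a rough illustration of this selective-encoding idea (FIG. 11 itself is not reproduced here), the sketch below partitions scene elements into an MPEG-encoded stream and a ZSD-encoded stream. The SceneElement fields and the rule that client-rendered features go to ZSD are assumptions made for the example, not the actual decision criteria.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SceneElement:
    name: str
    kind: str            # e.g. "image", "text", "button-bounds", "hoverzoom-foreground"
    client_rendered: bool

def split_streams(elements: List[SceneElement]) -> Tuple[List[SceneElement], List[SceneElement]]:
    # Partition elements into the MPEG-encoded stream (first data) and the
    # ZSD-encoded stream (second data).
    mpeg, zsd = [], []
    for e in elements:
        (zsd if e.client_rendered or e.kind in ("text", "button-bounds") else mpeg).append(e)
    return mpeg, zsd

if __name__ == "__main__":
    scene = [
        SceneElement("cover art wall", "image", False),
        SceneElement("Apollo 13 hoverzoom", "hoverzoom-foreground", True),
        SceneElement("Buy button bounds", "button-bounds", True),
    ]
    mpeg, zsd = split_streams(scene)
    print("MPEG stream:", [e.name for e in mpeg])
    print("ZSD stream: ", [e.name for e in zsd])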

As used herein, the term “MPEG encoding” is generic to MPEG-1, MPEG-2 and similar encodings, although some exemplary embodiments of the present invention do specifically refer to MPEG-2 encoding. General details associated with MPEG encoding per se will be known to those skilled in the art and are further available in the form of draft standards (e.g., ISO CD 11172). An exemplary MPEG-2 encoder 500 includes a plurality of unnumbered blocks which operate in accordance with the standard to perform MPEG-2 encoding (an exception being motion estimation unit 510 described in more detail below). One example of an MPEG encoder which provides a more detailed description of the unnumbered blocks of MPEG encoder 500 can be found in the various MPEG-2 standards documents, for example, Test Model 5 documents which evolved as a joint effort between ITU-T SG15.1 (known then as CCITT SG XV, Working Party XV/1, Experts Group on ATM Video Coding) and ISO/IEC JTC1/SC29 WG11 (MPEG). Specifically, the MPEG version of Test Model 5 is known as MPEG 93/225b and the ITU version of Test Model 5 is known as AVC-445b, the disclosures of which are incorporated here by reference. MPEG encoded data is stored in the MPEG/ZSD cache unit 54 for subsequent transmission to the client device 44.

Of particular interest with respect to the exemplary MPEG-2 transition and scene encoder 52 illustrated in FIG. 5 are the encoder hint collector 512 and motion estimator 510. One aspect of MPEG encoder 500 in the MPEG-2 transition and scene encoder 52 is its ability to quickly and efficiently provide a high level of compression of the MPEG data being encoded. Among other things, this can be achieved by using knowledge of where each of the scenes are “located” relative to one another in the user interface, which is defined a priori in exemplary user interfaces according to the present invention. This enables selective simplification of the standard MPEG motion estimation algorithm, which in turn speeds up the MPEG encoding process and/or reduces the amount of processing power that needs to be dedicated thereto. More specifically, when encoding sequential MPEG frames in an MPEG data stream, part of the information that is used to perform the encoding is information regarding where blocks of pixels have moved from one MPEG frame to the next MPEG frame (and/or backwards from a previous MPEG frame to a current MPEG frame). For example, if a block of pixels in a first MPEG frame has simply moved to a new screen location in a second MPEG frame, it is generally more efficient to determine and transmit a motion vector associated with that block of pixels than to re-encode that entire block of pixels again and resend them. Similarly, if that block of pixels has experienced a relatively uniform color difference (e.g., by transiting through a lighting effect), it is still efficient to provide a motion vector and some color difference information rather than retransmit the entire block of pixels.

In order to accommodate random object movement to support all types of, e.g., video data compression, standard MPEG motion estimation algorithms perform a search across blocks of pixel data to determine which blocks of pixels have moved (and in which direction) from frame to frame. For example, some searches, called full-pel searches, use 16×16 blocks, while others, called half-pel searches, use 16×8 blocks. These searches can become computationally expensive, particularly for high definition video data, and have been estimated to require up to 80% of the processing time/power associated with the operations performed by a standard MPEG encoder 500 (e.g., without the modifications introduced by the encoder hint collector 512). Thus, according to exemplary embodiments of the present invention, motion estimation associated with MPEG encoding is simplified using the fact that the user interface being generated by these client/server architectures does not involve random movement of objects. For example, in transitioning between the exemplary user interface screens of FIGS. 1(b) and 2, the image associated with “Apollo 13” moves from a first position on a display screen to a second position on a display screen (optionally with some magnification), both positions being known a priori to the encoder hint collector 512, which can calculate an MPEG motion vector therefrom.

Thus, the encoder hint collector 512 can pass the MPEG motion vector to motion estimation unit 510 with a command to use the passed motion vector for performing MPEG compression rather than performing a search in accordance with standard MPEG techniques. However, this use of knowledge of interrelated user interface screens to generate MPEG motion vectors may not always be able to generate a valid MPEG motion vector (e.g., due to limitations on the number of bits assigned for expressing MPEG motion vectors). Accordingly, encoder hint collector 512 also has the capability to command motion estimation unit 510 to employ the standard MPEG search algorithm to determine motion vectors on a frame-by-frame (or other) basis. In addition to either (1) using motion vectors which are generated entirely using the standard MPEG search algorithm or (2) using motion vectors which are generated entirely by the encoder hint collector 512 without use of the standard MPEG search algorithm, a third category of motion vectors which can be determined in accordance with the present invention are those which are calculated by the standard MPEG search algorithm having a search range which is limited based on the information available to the encoder hint collector 512.
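A minimal sketch of the cases just described follows. The ±64-pixel limit stands in for whatever range the motion-vector encoding actually allows and is purely an assumption for the example.

# Object positions before and after a transition are known a priori, so a
# motion vector can be handed to the encoder instead of searched for.
MAX_MV = 64   # assumed expressible motion-vector range, in pixels

def motion_hint(old_pos, new_pos):
    # Return a (dx, dy) hint for a block that moved from old_pos to new_pos,
    # or None if the displacement cannot be expressed as a valid vector.
    dx = new_pos[0] - old_pos[0]
    dy = new_pos[1] - old_pos[1]
    if abs(dx) <= MAX_MV and abs(dy) <= MAX_MV:
        return dx, dy
    return None

def estimation_mode(old_pos, new_pos):
    hint = motion_hint(old_pos, new_pos)
    if hint is not None:
        return ("use-hint", hint)                    # skip the search entirely
    return ("limited-search", (MAX_MV, MAX_MV))      # fall back to a bounded search

if __name__ == "__main__":
    print(estimation_mode((100, 200), (148, 180)))   # small move: hint used
    print(estimation_mode((100, 200), (800, 200)))   # large move: bounded search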

Referring back again to FIG. 4, MPEG data and scene description data generated by blocks 50 and 52 can be cached in memory device 54 for retrieval as needed by the scene request processor 56. The scene request processor 56 processes requests for scenes from client 44, e.g., if the client user interface state machine 62 receives an indication that the cursor associated with pointer 48 has paused over the image associated with “Apollo 13” (FIG. 1), then a request is sent back to scene request processor 56 to initiate a hoverzoom scene (described below), or if the client user interface state machine 62 receives an indication that the user wants to view a more detailed scene associated with “Apollo 13” (FIG. 2), then a request is sent back to scene request processor 56 to initiate that scene. The scene request processor 56 returns MPEG-2 transitions and scene description data back to the client 44 in response to the upstream requests. According to exemplary embodiments described in more detail below, for certain upstream requests the scene request processor 56 may dynamically determine whether MPEG data, scene description data or some combination of both is appropriate to service the requests. A detailed example of the scene request processor 56 is illustrated in FIG. 6.

Therein, the client request processor 600 coordinates all client interaction, e.g., by interpreting client requests and dispatching those requests to the appropriate components within scene request processor 56. For example, the client request processor tracks states and statistics on a per-client basis and stores such information in database 602. An out-of-band (OOB) client communication component 604 handles all communication with clients over OOB channels, including responding to connection requests and extracting protocol requests. The video playback control function 606 coordinates the operation of the MPEG-2 stream generation components, e.g., the scene loop generator 608 and the transition playback function 610. The scene loop generator 608 component generates loops of the user interface scenes and transmits them when no transitions occur. The transition playback function 610 loads MPEG-2 transition streams that were previously generated by the MPEG-2 transition and scene encoder 52 (e.g., via cache 54) and streams them to the requesting client. The transition playback function 610 may serve multiple streams simultaneously. The MPEG-2 transport stream encapsulation unit 612 updates the MPEG-2 transport stream as appropriate and forwards the stream to the UDP encapsulation unit 614, which groups MPEG-2 transport stream packets together and sends them over UDP to an IP-to-QAM gateway (not shown) in the MPEG stream transmitter 58.
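The last step, grouping transport-stream packets into UDP datagrams, can be sketched as follows. The seven-packets-per-datagram grouping and the gateway address are assumptions made for illustration; the actual grouping used by the UDP encapsulation unit 614 is not specified here.

import socket

TS_PACKET_SIZE = 188          # MPEG-2 transport stream packet size
PACKETS_PER_DATAGRAM = 7      # assumed grouping: 7 * 188 = 1316 bytes per UDP payload

def send_transport_stream(ts_bytes: bytes, gateway_addr: tuple) -> None:
    # Send the stream toward a (hypothetical) IP-to-QAM gateway, one group of
    # transport-stream packets per UDP datagram.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    chunk = TS_PACKET_SIZE * PACKETS_PER_DATAGRAM
    try:
        for offset in range(0, len(ts_bytes), chunk):
            sock.sendto(ts_bytes[offset:offset + chunk], gateway_addr)
    finally:
        sock.close()

# Example (address is a placeholder):
# send_transport_stream(open("transition.ts", "rb").read(), ("192.0.2.10", 5000))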

Referring again to FIG. 4, MPEG stream transmitter 58, on the server side, and MPEG stream receiver 64 and MPEG decoder 66, on the client side, enable the communication of both metadata, e.g., data used to populate the text fields shown in the user interface screen of FIG. 2, and content via a video streaming protocol link. The MPEG transmitter 58, receiver 64 and decoder 66 can be implemented using off-the-shelf components and, accordingly, are not described in detail herein. However, readers interested in more details relating to these elements, as well as other exemplary interactive television system architectures in which the present invention can be implemented, are referred to U.S. Pat. No. 6,804,708 to Jerding et al., the disclosure of which is incorporated here by reference. The on-screen display (OSD) graphics controller 68 receives scene data from the client state machine 62 and input from the cursor controller 69 to generate overlay graphics and local animations, e.g., zooming transitions, for the user interface. The MPEG video data and the OSD video data output from decoder 66 and OSD graphics controller 68, respectively, are combined by video combiner 70 and forwarded to display device 46 to generate the user interface. As mentioned above, the DVD cover art images shown in FIG. 1(a) are examples of user interface elements created using MPEG video data, while the zoomed version of the “Apollo 13” image in FIG. 1(b) and the circular icons in the upper right hand corner of the user interface screen of FIG. 1(a) are examples of user interface elements generated using scene description data.

Of particular interest for exemplary embodiments of the present invention is the client user interface state machine 62, a more detailed example of which is provided in FIG. 7. The client user interface state machine 62 interprets scene data and/or scripts received from the scene request processor 56 to present user interface scenes (e.g., as shown in FIGS. 1(a), 1(b) and 2) on client devices 44. The client user interface state machine 62 can also retrieve scene data and MPEG-2 transition clips from either the headend 42 (as represented by block 700) or from a local hard disk drive 702. Those skilled in the art will appreciate that, depending upon the system and/or type of client device involved, only one data source 700, 702 may be present in a particular implementation of the present invention or that some other type of data source can be used. Out-of-band (OOB) communications 704 can be used to provide signaling and commands to the client user interface state machine 62 via an operating system (OS) 706, e.g., PowerTV, Linux, Win32, etc., and operating system porting layer 708. The OS and OS porting layer 706, 708 can also track the user's activities with respect to the user interface and provide data to an event mapper function 710. Event mapper 710 translates user interface data, e.g., cursor movement, voice command input, motion of a 3D pointer, etc., into events which may require some change in the user interface, e.g., display change, audio change, zooming transition, etc. For example, when the user's cursor hovers over or passes over the image of “Apollo 13” in FIG. 1(a), the event mapper 710 would receive raw cursor data from the OS and map that into, for example, a hoverzoom event which results in that image being slightly magnified as illustrated in FIG. 1(b) and described in more detail below. As another example, if the OS 706, 708 passed a button click through to the event mapper 710 while the cursor was positioned over the magnified version of the “Apollo 13” image in FIG. 1(b), indicating that the user wanted more detail regarding this movie, then the event mapper 710 could identify a “transition to detailed view” event associated therewith, leading to a transition to the user interface screen of FIG. 2.
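A much-simplified sketch of the kind of mapping the event mapper 710 performs is shown below. The hit-testing rule and data layout are assumptions for this example, though the emitted event names mirror the ZSD event types listed later in Table 9.

def map_cursor_event(cursor_x, cursor_y, objects, hovered):
    # `objects` maps object IDs to (x, y, w, h) bounds; `hovered` is the ID the
    # cursor was previously over (or None).  Returns (events, new_hovered_id).
    events = []
    over = None
    for oid, (x, y, w, h) in objects.items():
        if x <= cursor_x < x + w and y <= cursor_y < y + h:
            over = oid
            break
    if over != hovered:
        if hovered is not None:
            events.append(("OnMouseExit", hovered))
        if over is not None:
            events.append(("OnMouseEnter", over))    # may in turn trigger a hoverzoom
    return events, over

if __name__ == "__main__":
    bounds = {"apollo13": (100, 100, 80, 120)}
    evts, now_over = map_cursor_event(130, 150, bounds, hovered=None)
    print(evts)   # [('OnMouseEnter', 'apollo13')]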

Events detected by event mapper 710 are queued in the event queue 712 for processing by event processor 714. The event processor 714 coordinates the activities of the client user interface state machine 62 by receiving events from the event queue 712 and dispatching them to the action library 716 based on, for example, the currently active scene data and/or script. The action library 716, in conjunction with a scene data loader 720 and various storage units 718, 722, operates to generate the change(s) to the currently displayed user interface screen based on the detected event as will be described in more detail below with respect to the discussion of scene data.
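The queue-and-dispatch flow can be pictured with the following sketch; the Scene stub and the dict-of-callables action library are placeholders for the real scene data, triple tables and action library 716.

from collections import deque

class Scene:
    # Placeholder: maps an event name to the actions it should trigger.
    def __init__(self, bindings):
        self.bindings = bindings
    def actions_for(self, event):
        return self.bindings.get(event[0], [])

class EventProcessor:
    def __init__(self, action_library):
        self.queue = deque()               # stands in for event queue 712
        self.actions = action_library      # stands in for action library 716
    def post(self, event):
        self.queue.append(event)
    def run_pending(self, scene):
        while self.queue:
            event = self.queue.popleft()
            for name in scene.actions_for(event):
                self.actions[name](event)  # dispatch to the action implementation

library = {"StartHoverZoom": lambda e: print("hoverzoom for", e[1])}
ep = EventProcessor(library)
ep.post(("OnMouseEnter", "apollo13"))
ep.run_pending(Scene({"OnMouseEnter": ["StartHoverZoom"]}))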

Scene Description Data Format

Having described some exemplary server/client architecture for generating user interfaces according to exemplary embodiments of the present invention, a second exemplary data format (in addition to MPEG/MPEG-2) which can be used in conjunction with this architecture will now be described. Although other data formats can be used in conjunction with the present invention, this exemplary data format effectively creates a state machine that enables the client device 44 to respond to user interactions and system events. This data format is arbitrarily extensible to support both very low powered client devices 44 and high end client devices 44, e.g., PCs. Other goals of this exemplary scene data format (also referred to as “ZSD”) include theme support, future language support, demo scripting, and automated test support.

The ZSD format supports two types of scenes: the exclusive scene and overlay scenes. Herein, the exclusive scene is referred to simply as the scene, since it occupies the full screen and contains the primary user interaction elements. Overlay scenes describe full or partial scenes that the client user interface state machine 62 logically overlays on top of the exclusive scene. While the exclusive scene changes as the user navigates, the overlay scenes may or may not change. This enables them to support features such as music controls, global navigation, bookmarks, etc., that follow the user as they navigate from exclusive scene to scene. Exclusive scenes launch overlay scenes initially, but overlay scenes may launch other overlays. Although it is possible to terminate all overlay scenes, the overlay scenes control their own lifetime based on interaction from the user or based on the current exclusive scene.

The exclusive scene and all overlay scenes logically exist in their own namespaces. In order for ZSD elements to refer to elements in other scenes, ZSD references as described herein could be modified to include a field to specify the namespace. Inter-scene communication is useful for operations such as notifying overlay scenes what is in the exclusive scene. To support inter-scene communication, the sender triggers actions to generate events. These events are then dispatched by the event processor 714 to each scene. When the event contains a Resource ID, that ID is mapped to an equivalent resource in the destination scene. If the destination scene does not contain an equivalent resource, the event processor 714 moves on to test dispatching the event to the next scene.

Every exclusive scene passes through the following states sequentially on the client: (1) Entered, (2) Loaded, (3) Steady State, (4) Unloading and (5) Exited. When the exclusive scene's ZSD data is initially decoded, the scene enters the Entered state. At this point, the event processor 714 fires the OnLoad event so that the exclusive scene can perform any initial actions. Once the event processor 714 completes the OnLoad event dispatch process, the exclusive scene enters the Loaded state. At this point, the event processor 714 may have pending events in its queue 712. The event processor 714 clears out this queue 712 and then transitions the exclusive scene to its Steady State. FIG. 8 illustrates an exemplary exclusive scene life cycle using scene membership messaging to show event processing in all states. The process for unloading an exclusive scene is essentially the reverse of the load process. For this case, a GoToScene or other scene-changing action initiates the unload process. At this point, the exclusive scene changes to the Unloading state. Once all ZSD unload processing completes, the process transitions to the Exited state, wherein the client may optionally retain some or all of the exclusive scene's ZSD data. The changes in the exclusive scene's state are communicated to all currently loaded overlay scenes so the overlay scenes can take action (if needed).
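The life cycle just described can be captured as a small state machine. The sketch below is schematic only, with fire_event standing in for the event processor's dispatch of OnLoad and OnUnload.

ENTERED, LOADED, STEADY, UNLOADING, EXITED = (
    "Entered", "Loaded", "Steady State", "Unloading", "Exited")

class ExclusiveScene:
    def __init__(self, name):
        self.name = name
        self.state = None

    def load(self, fire_event):
        self.state = ENTERED              # ZSD data has just been decoded
        fire_event("OnLoad", self)        # scene performs its initial actions
        self.state = LOADED

    def queue_drained(self):
        self.state = STEADY               # pending events have been cleared

    def unload(self, fire_event):
        self.state = UNLOADING            # triggered by GoToScene or similar
        fire_event("OnUnload", self)
        self.state = EXITED               # ZSD data may optionally stay cached

scene = ExclusiveScene("movie-detail")
scene.load(lambda evt, s: print(evt, "fired for", s.name))
scene.queue_drained()
print(scene.state)   # Steady State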

Overlay scenes exist independently of, and on top of, the exclusive scene. For example, in FIG. 1(a) the three icons depicted in the upper right-hand corner (home, up arrow and TV) can be implemented as overlay scenes on the exclusive scene (the images of various DVD covers, implemented in the MPEG layer). Another example, not shown in FIGS. 1 and 2, is the provision of volume control and/or channel selection user interface objects as overlay scenes. Termination of an overlay scene can be accomplished from within the scene itself, or by request from the exclusive scene. Additionally, SceneMembershipNotification events can be used to limit the lifetime of an overlay scene to a particular set of exclusive scenes as shown, for example, in FIG. 9. Each of the exclusive scenes that belong to this scene group would send a SceneMembershipNotification message when they are loaded. The overlay scene associated with this scene group would use the ExclusiveSceneChange events and the SceneMembershipNotification message to tell if the overlay scene should stay loaded or should terminate itself. As long as it receives a SceneMembershipNotification that matches its Scene Group, the overlay scene can stay loaded. Triple tables (mentioned in FIG. 9) are described in more detail below.
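The stay-loaded-or-terminate decision can be sketched as follows; the scene-group names and the set of membership notifications are made up for the example.

class OverlayScene:
    # The overlay stays loaded only while the newly loaded exclusive scene
    # announces membership in the overlay's scene group.
    def __init__(self, scene_group):
        self.scene_group = scene_group
        self.loaded = True

    def on_exclusive_scene_change(self, moment, membership_notifications):
        # `moment` is one of Entered/Loaded/Unloading/Exited (see OnExclusiveSceneChange).
        if moment == "Loaded" and self.scene_group not in membership_notifications:
            self.loaded = False            # terminate: the new scene is outside our group

overlay = OverlayScene("movie-browser")
overlay.on_exclusive_scene_change("Loaded", {"movie-browser"})
print(overlay.loaded)    # True: stay loaded
overlay.on_exclusive_scene_change("Loaded", {"music-browser"})
print(overlay.loaded)    # False: terminate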

According to one exemplary embodiment of the present invention, each scene contains the following descriptive information:

TABLE 3. Scene Information Fields

Scene ID: A globally unique ID for this scene.
Description: An optional string description to help identify this scene to a developer.
SceneDimension: The dimensions used to lay out the scene.
ZSD Format Version: This field has the integer value one.
ZSD Profile: This field is the name of the minimally supported profile. Currently it can take on the values “Simple” and “Advanced”.
Maximum Action Stack Size: This field specifies the maximum number of elements that may be pushed onto the Action Stack for this scene.
Cache Property Type: This field specifies how a ZSD interpreter may cache this scene.
Cache Property Value: This field can be used to specify a 32-bit integer value based on the Cache Property Type. It should be set to 0 if unused.

In order to improve ZSD load time performance, a client device 44 may optionally implement a ZSD cache 722. ZSD-encoded scenes specify caching properties to direct clients when the caching behavior is no longer useful. For example, temporally important information such as sports scores should not be cached for a long period of time. Table 4 lists exemplary caching property types and describes their use.

TABLE 4. Cache Properties

Timeout (property value units: seconds): Time out this scene after the specified number of seconds. (0 seconds implies no caching.)
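A minimal sketch of an optional client-side ZSD cache honouring the Timeout property (with 0 meaning no caching, per Table 4) might look like this; the class and method names are illustrative only.

import time

class ZsdCache:
    def __init__(self):
        self._entries = {}   # scene_id -> (zsd_data, expiry time)

    def store(self, scene_id, zsd_data, timeout_seconds):
        if timeout_seconds == 0:
            return                               # caching disallowed for this scene
        self._entries[scene_id] = (zsd_data, time.time() + timeout_seconds)

    def lookup(self, scene_id):
        entry = self._entries.get(scene_id)
        if entry is None:
            return None
        data, expiry = entry
        if time.time() > expiry:
            del self._entries[scene_id]          # timed out; force a fresh request
            return None
        return data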

An exemplary scene data format according to the present invention has four fundamental data types (sometimes referred to herein as “elements”), specifically objects, events, actions, and resources. At a high level, objects describe scene components such as the bounds for buttons and icons in the MPEG layer, overlay text, and overlay images. Events describe the notifications that are pertinent to the scene. These include mouse (pointer) move events, keyboard events, application state change events, etc. Actions describe responses to events such as going to another scene, and finally, resources contain the raw data used by objects, events, and actions, e.g., image data. Each of these data types is described in more detail below.

Exemplary object types and parameters associated therewith (including an optional set of properties) according to an exemplary embodiment of the present invention are described in Tables 5-8.

TABLE 5. Object Types

WholeScene (value 0; parameters: none): The whole scene object, OID 0, has this type.
Bounds (value 1; parameters: X, Y, Width, Height): This object specifies a rectangular bound in the scene coordinate system.
PNode (value 2; parameters: X, Y, Width, Height, Parent Object): This object specifies a PNode with the specified bounds.

TABLE 6. Reserved Object IDs

WholeScene (Object ID 0; type: WholeScene): The whole scene.
Reserved (Object IDs 1-63; type: N/A): Reserved.

TABLE 7. Object Type Support

WholeScene: supported in the Simple and Advanced profiles.
Bounds: supported in the Simple and Advanced profiles.
PNode: supported in the Advanced profile only.

TABLE 8. Object Properties

Cursor (parameters: Cursor Resource ID): required for WholeScene; optional for Bounds and PNode.

Like the other scene description format elements, each event is assigned a globally unique value. Some event types employ filters to constrain the actions that they would trigger. For example, the OnKeyPress event uses the key of interest. In addition to filters, events can push resources onto the action stack, described below. Actions may use the information on the stack to modify their behavior.

Exemplary event types are listed in Table 9 below. Overlay scenes affect the propagation of events by the dispatcher. Dispatch semantics are abbreviated in the table as follows:

1. Active: the dispatcher sends the event only to the active scene. For example, when a scene is loaded, the OnLoad event only gets sent to that scene.
2. Scenes with Resource Filters: the dispatcher only sends these events to scenes that contain Resource Table entries for the event. Before iterating through a scene's triple table, the event dispatcher remaps the Resource IDs in the event to their equivalents in the scene.
3. Overlays Only: the dispatcher only sends these events to overlay scenes.
4. Both: the dispatcher first sends this event to the overlay scenes and then to the exclusive scene.

TABLE 9. Event Types

OnLoad (value 0; dispatch: Active; filter: none; action stack: none): This event gets sent when the object gets loaded.
OnKeyPress (value 1; dispatch: Both; filter: Key; action stack: Key): This event gets sent when the user presses a key or remote control button.
OnKeyRelease (value 2; dispatch: Both; filter: Key; action stack: Key): This event gets sent when the user releases a key or remote control button.
OnKeyTyped (value 3; dispatch: Both; filter: Key; action stack: Key): This event gets sent when the user types a key. If the key supports auto-repeat, the system sends this event repeatedly while the key is down.
OnMouseEnter (value 4; dispatch: Both; filter: none; action stack: none): This event gets sent when the mouse pointer goes over the object.
OnMouseExit (value 5; dispatch: Both; filter: none; action stack: none): This event gets sent when the mouse pointer exits the bounds of the object.
OnMousePress (value 6; dispatch: Both; filter: Button; action stack: X, Y, Button): This event gets sent when the user presses a mouse button.
OnMouseRelease (value 7; dispatch: Both; filter: Button; action stack: X, Y, Button): This event gets sent when the user releases a mouse button.
OnMouseClick (value 8; dispatch: Both; filter: Button; action stack: X, Y, Button): This event gets sent when the user presses and releases a mouse button.
OnFocusIn (value 9; dispatch: Both; filter: none; action stack: none): This event gets sent when the associated object receives focus. Other events generally cause focus, such as key presses and mouse enter.
OnFocusOut (value 10; dispatch: Both; filter: none; action stack: none): This event gets sent when the associated object loses focus.
OnSceneMembershipNotification (value 11; dispatch: Scenes with Resource Filters; filter: SceneMembership Resource ID; action stack: SceneMembership Resource ID): This event gets sent when a NotifySceneMembership action gets fired.
OnScrollUp (value 12; dispatch: Both; filter: Wheel; action stack: Wheel): This event gets fired for every notch that the specified scroll wheel moves up.
OnScrollDown (value 13; dispatch: Both; filter: Wheel; action stack: Wheel): This event gets fired for every notch that the specified scroll wheel moves down.
OnTimeout (value 14; dispatch: Both; filter: Timer; action stack: Timer): This event gets fired when a timer expires.
OnActivate (value 15; dispatch: Both; filter: none; action stack: none): This event gets fired when an object gets activated.
OnExclusiveSceneChange (value 16; dispatch: Overlays Only; filter: Entered, Loaded, Unloading, Exited; action stack: none): This event gets fired when the exclusive scene changes. The argument specifies the exact moment in the scene change. See the scene life cycle sequence diagram.
OnUnload (value 17; dispatch: Both; filter: none; action stack: none): This event gets fired when an object gets unloaded as the result of a scene change.

In operation of the architectures and methods described herein, the result of an event on an object is an action. Actions may be linked together in a ZSD Action Table to form programs. To facilitate parameter passing to actions from events and to linked actions, a ZSD interpreter maintains an action stack. The action stack is initialized before dispatching the first action in an action list with the following items in order:

1. The object in the triple table entry that triggered the action
2. The event in the triple table entry that triggered the action
3. Elements pushed onto the action stack from the event

Before dispatching each action, the ZSD interpreter logically pushes the parameters of the action onto the stack. Implementations may short-circuit this behavior on built-in actions for simplicity. Each action type specifies its use of the stack. In general, a ZSD interpreter will only be able to allocate a small action stack (e.g., 16-32 elements), so stack usage should be kept to a minimum. To ensure that the ZSD interpreter always has a sufficient stack, the ZSD encoder must specify the maximum stack size in the header. All action types should avoid recursion to simplify the maximum stack size calculation. Exemplary action types are listed below in Table 10.
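The stack discipline described above can be sketched as follows; the Triple structure, the actions dict and the handler are simplifications for illustration, not the real ZSD built-in actions.

from collections import namedtuple

Triple = namedtuple("Triple", "object_id event_id action_id")

class ActionStackOverflow(Exception):
    pass

def run_action_list(triple, event_pushes, actions, max_stack):
    # actions maps an action ID to (parameters, handler, next_action_id);
    # a next_action_id of None plays the role of the NoAction terminator.
    stack = [triple.object_id, triple.event_id, *event_pushes]   # initial stack contents, in order
    action = triple.action_id
    while action is not None:
        params, handler, next_action = actions[action]
        stack.extend(params)                # parameters are pushed before dispatch
        if len(stack) > max_stack:          # the ZSD header declares the maximum size
            raise ActionStackOverflow(action)
        handler(stack)                      # each action type defines its own stack use
        action = next_action                # linked actions form small programs

# Example: a click on object 1 runs a single hypothetical action.
acts = {"Focus": ([7], lambda st: st.pop(), None)}
run_action_list(Triple(1, "OnMouseClick", "Focus"), ["Button1"], acts, max_stack=8)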

TABLE 10. Action Types

NoAction (value 0; parameters: none; stack inputs: none; stack outputs: none; stack delta: 0): This action is a NOP.
GoToScene (value 1; parameters: Scene ID, Duration; stack inputs: parameters; stack outputs: none; stack delta: −2): This action causes the client to animate to a new location in the specified time. If the server context buffer has information, this command bundles the context with the scene navigation request.
NavigateBack (value 2; parameters: Count; stack inputs: parameters; stack outputs: none; stack delta: −1): Navigate the specified number of scenes back in history. If the history does not contain that many scenes, it navigates back as far as possible. If the server context buffer has information, this command bundles the context with the scene navigation request.
NavigateForward (value 3; parameters: Count; stack inputs: parameters; stack outputs: none; stack delta: −1): Navigate the specified number of scenes forward in history. If the history does not contain that many scenes, it navigates forward as far as possible. If the server context buffer has information, this command bundles the context with the scene navigation request.
NavigateHome (value 4; parameters: none; stack inputs: none; stack outputs: none; stack delta: 0): Navigate to the home scene. If the server context buffer has information, this command bundles the context with the scene navigation request.
NavigateUp (value 5; parameters: Count, Duration; stack inputs: parameters; stack outputs: none; stack delta: −2): Navigate to the scene that is geographically up n times in the specified time. If the server context buffer has information, this command bundles the context with the scene navigation request.
StartTimer (value 6; parameters: Timer, Duration; stack inputs: parameters; stack outputs: none; stack delta: −2): Start a timer that sends a timeout event in the specified duration. Timers are global to the scene.
StopTimer (value 7; parameters: Timer; stack inputs: parameters; stack outputs: none; stack delta: −1): Stop the specified timer.
StartHoverZoom (value 8; parameters: X, Y, Width, Height, Resource ID, Duration; stack inputs: parameters; stack outputs: none; stack delta: −7): Hoverzoom to the end coordinates (x, y, width, height) over the specified duration, using the Resource ID associated with a HoverZoomPixelData resource to create the HoverZoom.
StopHoverZoom (value 9; parameters: Duration; stack inputs: parameters; stack outputs: none; stack delta: −1): Stop the hoverzoom over the specified number of milliseconds.
Focus (value 10; parameters: Object ID; stack inputs: parameters; stack outputs: none; stack delta: −1): Force the focus to change to the specified object.
ChangePointer (value 11; parameters: Resource ID, Object ID; stack inputs: parameters; stack outputs: none; stack delta: −2): Change the pointer to that specified by the Resource ID when over the object specified by the Object ID.
ChangePointerVisibility (value 12; parameters: Visible, Duration; stack inputs: parameters; stack outputs: none; stack delta: −2): True to show the pointer; false to hide it. Animate for the specified duration.
MovePointer (value 13; parameters: X, Y, Duration; stack inputs: parameters; stack outputs: none; stack delta: −3): Move the pointer to the specified location over the specified duration.
Activate (value 14; parameters: Object ID; stack inputs: parameters; stack outputs: none; stack delta: −1): Activate the specified object.
PushServerContext (value 15; parameters: Resource ID; stack inputs: parameters; stack outputs: none; stack delta: −1): Push the specified resource for transmission back to the server.
ReportServerContext (value 16; parameters: none; stack inputs: none; stack outputs: none; stack delta: 0): Report the gathered context to the server. If there is no pending context, then this action is ignored. After the report, this command clears the context buffer.
CreateTextObject (value 17; parameters: Object ID, Resource ID; stack inputs: parameters; stack outputs: none; stack delta: −2): Show the text object specified by the Resource ID using the Object specified by the Object ID.
CreateImageObject (value 18; parameters: Object ID, Resource ID; stack inputs: parameters; stack outputs: none; stack delta: −2): Show the image specified by the Resource ID using the Object specified by the Object ID.
NotifySceneMembership (value 19; parameters: SceneMembership Resource ID; stack inputs: parameters; stack outputs: none; stack delta: −2): Notify scene membership. This is usually done in response to an OnLoad event.
StartOverlayScene (value 20; parameters: Overlay Scene Resource ID; stack inputs: parameters; stack outputs: none; stack delta: −2): Load and start the specified overlay scene.
TerminateOverlayScene (value 21; parameters: none; stack inputs: none; stack outputs: none; stack delta: 0): Terminate the current overlay scene. Triggering this action from the main scene does nothing.
TerminateAllOverlayScenes (value 22; parameters: none; stack inputs: none; stack outputs: none; stack delta: 0): Terminate all overlay scenes. This action is useful for resyncing client and server state.
SetActiveTripleTable (value 23; parameters: Triple Table Index; stack inputs: parameters; stack outputs: none; stack delta: −1): Set the active Triple Table. Index 0 is the default.
RunScript (value 24; parameters: Resource ID; stack inputs: parameters; stack outputs: 0+; stack delta: arbitrary): Interpret the specified script.

Exemplary resources which can be used in conjunction with the present invention are listed below in Table 11.

TABLE 11. Resource Types

UTF8String (value 0; parameters: UTF8String): This resource type holds string characters from the UTF8 character set. The string may not exceed 256 characters.
UnicodeString (value 1; parameters: UnicodeString): This resource type holds Unicode characters. The string may not exceed 256 characters.
MPEG2TransitionClip (value 2; parameters: Scene ID, Scene ID, MPEG-2 clip): This resource type points to an MPEG-2 clip file for the transition between the two scenes. Scenes list all of the MPEG-2 clips for clients with hard disk support or for servers. These clips may change based on the current theme.
Cursor (value 3; parameters: Image): This resource holds the cursor image.
Image (value 4; parameters: Image): This resource holds an image.
HoverZoom (value 5; parameters: PixMask, FGTransPix, FGOpaquePix, BGPix): This resource holds the image data for creating a hoverzoom.
SceneMembership (value 6; parameters: UTF8String): This resource identifies a scene's membership, such as belonging to an application.
OverlayScene (value 7; parameters: Scene): This resource holds an embedded ZSD description for an overlay scene.

According to an exemplary embodiment of the present invention, the scene description format groups all scene interaction information into five tables: the object table, the event table, the action table, the resource table and one or more triple tables, as described below in Tables 12-17. This division into tables eliminates most redundant information and enables quick lookup of interaction behavior on low end clients 44.

TABLE 12. ZSD Tables

Object Table: This table lists all of the objects in the scene. Objects may be high level entities such as PNodes or just regions on the scene.
Event Table: This table lists all events that need processing on this scene. A client may ignore any event not listed in this table.
Action Table: This table lists all actions that can be invoked on objects on this scene.
Resource Table: This table contains strings and images. Its main use is to decouple the string and image data from the above tables so that it is trivial for the server to switch themes and languages.
Triple Table: This table associates objects, events, and actions. A ZSD encoding may include more than one triple table and use actions to switch between the active one. This enables the creation of state machines within a scene.

TABLE 13 Object Table Fields

Field | Description
Object ID | A unique ID for this object. OID number 0 represents the whole scene.
Object Type | The type of the object.
Description | An optional string description to make the XML clearer.
Parameters | Additional parameters that describe the object.

TABLE 14 Event Table Fields

Field | Description
Event ID | A unique ID for this event.
Event Type | The type of the event.
Description | An optional string description to make the XML clearer.
Parameters | Additional parameters that describe the event.

TABLE 15 Action Table Fields

Field | Description
Action ID | A unique ID for this action.
Action Type | The type of the action.
Next Action | The Action ID of the next action to run. Specify the NoAction instance to stop executing actions. It is illegal to specify a loop of actions.
Description | An optional string description to make the XML clearer.
Parameters | Additional parameters that describe the action.

TABLE 16 Resource Table Fields

Field | Description
Resource ID | A unique ID for this resource.
Theme ID | The theme ID for this resource.
Language ID | The language ID for this resource.
Resource Type | The type of the resource.
Description | An optional string description to make the XML clearer.
Parameters | Additional parameters that describe the resource.

TABLE 17 Triple Table Fields

Field | Description
Object ID | The triple's object.
Event ID | The event to monitor.
Action ID | The action to invoke upon receiving the event.
Boolean | True to terminate event processing if this triple matches an event.
Description | An optional string description to make the XML clearer.

Various additional information regarding an exemplary scene data format according to the present invention can be found in the above-incorporated-by-reference priority application.
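
For illustration, the sketch below shows one possible in-memory representation of these tables on a client and how the active triple table could be used to map an (object, event) pair onto a chain of actions. All type and method names are assumptions introduced here; only the table fields themselves come from Tables 12-17.

import java.util.List;
import java.util.Map;

// Illustrative in-memory form of the ZSD scene tables (Tables 12-17); all names are assumptions.
record ObjectRow(int objectId, int objectType, String description, byte[] parameters) {}
record EventRow(int eventId, int eventType, String description, byte[] parameters) {}
record ActionRow(int actionId, int actionType, int nextActionId, String description, byte[] parameters) {}
record ResourceRow(int resourceId, int themeId, int languageId, int resourceType, byte[] parameters) {}
record TripleRow(int objectId, int eventId, int actionId, boolean terminate) {}

final class SceneDescription {
    Map<Integer, ObjectRow> objects;
    Map<Integer, EventRow> events;
    Map<Integer, ActionRow> actions;
    Map<Integer, ResourceRow> resources;   // decoupled so the server can swap themes and languages
    List<List<TripleRow>> tripleTables;    // more than one table enables per-scene state machines
    int activeTripleTable = 0;             // index 0 is the default; SetActiveTripleTable switches it

    // Dispatch an event on an object: run every matching triple's action chain,
    // stopping early if a matching triple's terminate flag is set.
    void dispatch(int objectId, int eventId) {
        for (TripleRow t : tripleTables.get(activeTripleTable)) {
            if (t.objectId() == objectId && t.eventId() == eventId) {
                runActionChain(t.actionId());
                if (t.terminate()) {
                    return;
                }
            }
        }
    }

    private void runActionChain(int actionId) {
        // Follow Next Action links until a NoAction entry is reached (action loops are illegal).
    }
}

Keeping the resource table separate from the other tables, as this sketch does, is what allows a theme or language change to be implemented by swapping resource entries without touching the object, event, action or triple tables.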

Client devices 44 without local storage request scenes and transitions from the server 42. An exemplary set of messages which can be used to perform this function is provided below in Table 18. The client/server link can, for example, be made over an Ethernet connection, QPSK channels (used by cable networks currently for OOB communications) or any other protocol or type of connection. Those skilled in the art will appreciate that this message set is purely exemplary and that messages can be added or deleted therefrom.

TABLE 18 Client-Server Messages

Message Name | ID | Source | Description
RequestScene | 0 | Client | Request the specified scene.
RequestSceneAck | 1 | Server | Acknowledgment that the server is sending the requested scene.
SceneDetails | 2 | Server | The server may send this to the client if it does not send scene details in-band with the MPEG-2 scene transitions.
DebugControl | 3 | Server | The server sends this message to enable/disable debug logging and remote control support on the client.
LogMessage | 4 | Client | Log a text message. The client only sends this message in debug mode.
NotifyEvent | 5 | Client | Notify that an event has occurred. The client only sends this message in debug mode.
NotifyAction | 6 | Client | Notify that an action has been fired. The client only sends this message in debug mode.
NotifyTriple | 7 | Client | Notify that a triple table entry matched. The client only sends this message in debug mode.
GenerateEvent | 8 | Server | Generate and fire the specified event on the client. These events will be fired even in lockout mode. The client only accepts this message in debug mode.
Lockout | 9 | Server | Lock out/unlock all user-generated events on the client. Example events include mouse and keyboard events. The client only accepts this message in debug mode.
Identity | 10 | Client | The client sends this message every time that it establishes a connection with the server to identify itself.
NotifyServerContext | 11 | Client | The client sends this message when its server context buffer is not empty and an action command invokes a server notification or request.
RequestScreenCapture | 12 | Server | The server sends this message to request that the client take a snapshot of the screen and send it back to the server in a ScreenCapture message.
ScreenCapture | 13 | Client | This is the response message to RequestScreenCapture. It contains the snapshot.
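
As a rough illustration of how this message set might be exercised, the following Java sketch shows a client issuing a RequestScene and inspecting the reply. The framing (a one-byte message ID followed by a payload) and the class name SceneClient are assumptions for illustration; the actual wire format is not specified here.

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;

// Sketch of a scene request over the client/server link of Table 18.
// The one-byte-ID-plus-payload framing and the integer scene ID are assumptions, not specified.
final class SceneClient {
    static final int REQUEST_SCENE     = 0;
    static final int REQUEST_SCENE_ACK = 1;
    static final int SCENE_DETAILS     = 2;

    private final DataOutputStream out;
    private final DataInputStream in;

    SceneClient(Socket socket) throws IOException {
        this.out = new DataOutputStream(socket.getOutputStream());
        this.in = new DataInputStream(socket.getInputStream());
    }

    void requestScene(int sceneId) throws IOException {
        out.writeByte(REQUEST_SCENE);
        out.writeInt(sceneId);
        out.flush();

        int reply = in.readUnsignedByte();
        if (reply == REQUEST_SCENE_ACK) {
            // The requested scene follows, e.g. as an MPEG-2 transition clip.
        } else if (reply == SCENE_DETAILS) {
            // Scene details sent separately because they were not embedded in the MPEG-2 stream.
        }
    }
}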

Hoverzoom

As mentioned above, one feature of exemplary client-server architectures and methods according to the present invention is to provide the capability for sophisticated user interfaces to be generated at the client-side, while taking into account the relatively small amount of available memory and/or processing power associated with some existing client devices. One example of the ways in which the above-described systems and methods address this issue can be seen with respect to the user interface interaction referred to herein as a "hoverzoom", e.g., the process whereby, when a user rolls a cursor over and/or pauses an indicator relative to a media item that can be selected, the image associated therewith is magnified so that the user can easily see which object is poised for selection, an example of which is illustrated in FIGS. 1(a) and 1(b).

There are a number of challenges associated with implementing a hoverzoom feature in bandwidth-limited systems, such as interactive television systems wherein the client devices have limited memory and/or processing power. Consider the example wherein the user interface screen illustrated in FIG. 1(a) is rendered using MPEG data streams transmitted from the user interface server 42 to the client 44 containing the cover art images associated with various movies. This visual portion of the user interface screen will be referred to herein as the background layer. When the event mapper 710 and event processor 714 recognize that the user has triggered a hoverzoom response, a foreground layer (e.g., the magnified version of the "Apollo 13" image) is generated and used to modify the user interface screen of FIG. 1(a). There are several possibilities for providing the data used to transition from the user interface screen shown in FIG. 1(a) to the user interface screen shown in FIG. 1(b). One way to implement the hoverzoom effect is to have the user interface server 42 transmit complete sets of MPEG data corresponding to both the background layer and the foreground layer to the client 44. However, when one considers that the user can roll the cursor over a potentially very large number of screen objects in the user interface, e.g., dozens or hundreds, quite rapidly, the amount of data that would need to be transmitted by the user interface server 42 to implement this exemplary embodiment of the present invention could be quite large, resulting in additional delay in rendering the screen transitions on the client device 44.

Moreover, it can be seen from comparing FIG. 1(a) with FIG. 1(b) that a significant portion of the pixel data associated with the unzoomed version of FIG. 1(a) is reused in creating the hoverzoomed version of FIG. 1(b). Thus, according to another exemplary embodiment of the present invention, the relationship between pixels in the background layer and the foreground layer can be determined and used to reduce the amount of data that needs to be transmitted to the client device 44 to generate a hoverzoom effect. Depending upon the object to be magnified as part of the hoverzoom effect, this relationship can be relatively simple or somewhat more complex. For example, enlarging the rectangular DVD cover art images of FIG. 1(a) primarily involves enlarging a rectangular image to occlude neighboring images as part of the transition. On the other hand, more complex shapes, e.g., a doughnut-shaped object with a hole in the center, present more complex situations for generating a hoverzoom effect. Consider that as the doughnut-shaped object is enlarged, the hole in the middle will expand such that background layer pixels that were previously hidden become revealed after the hoverzoom effect has occurred.

According to one exemplary embodiment of the present invention, each pixel in the foreground version of the image is categorized as being one of: (1) completely opaque (the pixel color can be extracted from the background layer, so it does not need to be resent for foreground layer generation), (2) transparent (irrelevant, so it does not need to be resent for the foreground layer), (3) translucent (e.g., pixels around the edges of the image can have anti-aliasing applied thereto, so foreground layer data needs to be sent for these pixels) and (4) null (e.g., doughnut "hole" pixels which reveal background pixels, for which background layer pixels need to be sent since those cannot necessarily be extracted from the background layer that was originally sent to create the unzoomed interface screen). This categorization can be done a priori using any desired technique, including manual observation and/or using the pseudocode processing techniques described below, and a foreground/background map is generated wherein each pixel in the foreground layer is categorized. A hoverzoom map can be stored for each image for which a hoverzoom effect can be triggered in the user interface.
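
As an illustration of how such a map might be computed, the following Java sketch classifies each pixel of the hoverzoomed foreground image using its alpha channel and the alpha of the unzoomed image at the same location. The class, enum and method names are illustrative assumptions, and the alpha tests are a simplified stand-in for whatever categorization technique is actually used; both images are assumed to be aligned in the same coordinate space.

import java.awt.image.BufferedImage;

// Sketch: build the per-pixel foreground/background map described above (simplified assumption).
final class HoverZoomMap {
    enum PixelClass { OPAQUE, TRANSPARENT, TRANSLUCENT, NULL_REVEALS_BACKGROUND }

    // zoomedForeground: the foreground image at its hoverzoomed (maximum) size.
    // unzoomedForeground: the same image at its original size, aligned to the same coordinates.
    static PixelClass[][] classify(BufferedImage zoomedForeground, BufferedImage unzoomedForeground) {
        int w = zoomedForeground.getWidth();
        int h = zoomedForeground.getHeight();
        PixelClass[][] map = new PixelClass[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int alpha = zoomedForeground.getRGB(x, y) >>> 24;
                int coveredAlpha = unzoomedForeground.getRGB(x, y) >>> 24;
                if (alpha == 0xFF) {
                    // Fully opaque: color can be derived by scaling the already-sent background art.
                    map[y][x] = PixelClass.OPAQUE;
                } else if (alpha == 0x00 && coveredAlpha == 0x00) {
                    // Never covered by the unzoomed image and not drawn when zoomed: nothing to send.
                    map[y][x] = PixelClass.TRANSPARENT;
                } else if (alpha == 0x00) {
                    // "Doughnut hole": the spot was covered in the unzoomed screen, so the
                    // previously hidden background pixel must be sent.
                    map[y][x] = PixelClass.NULL_REVEALS_BACKGROUND;
                } else {
                    // Anti-aliased edge: foreground pixel data itself must be sent.
                    map[y][x] = PixelClass.TRANSLUCENT;
                }
            }
        }
        return map;
    }
}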

To Capture Background

for (node = scenegraph.root(); node != foreground node; node = next node)
    if (node bounds within foreground bounds)
        paint node to background image
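
A runnable rendition of this fragment is sketched below in Java, assuming a hypothetical SceneNode type with bounds() and paint() methods standing in for the scene graph nodes (e.g., PNodes) mentioned elsewhere; "within" is interpreted here as intersecting the foreground bounds, which is an assumption.

import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.List;

// Sketch of the background-capture step; SceneNode and BackgroundCapture are assumed names.
interface SceneNode {
    Rectangle bounds();
    void paint(Graphics2D g);
}

final class BackgroundCapture {
    // Paint every node below the foreground node whose bounds intersect the foreground bounds.
    static BufferedImage capture(List<SceneNode> paintOrder, SceneNode foregroundNode) {
        Rectangle fg = foregroundNode.bounds();
        BufferedImage background = new BufferedImage(fg.width, fg.height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = background.createGraphics();
        g.translate(-fg.x, -fg.y);                 // capture in foreground-local coordinates
        for (SceneNode node : paintOrder) {
            if (node == foregroundNode) break;     // stop before the foreground node itself
            if (node.bounds().intersects(fg)) {
                node.paint(g);
            }
        }
        g.dispose();
        return background;
    }
}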

To Capture Foreground

Draw the foreground node to an image with the foreground's original size (low-res foreground)
Draw the foreground node to an image with the foreground's maximum size (high-res foreground)

After mapping, this data is encoded to reduce the amount of data to be saved and transferred at steps 1010 and 1012 using, for example, the following pseudocode to evaluate the relevance of the background pixels based on alpha information.

To Capture Alpha Information

Calculate Foreground Node starting bounds
Calculate Foreground Node ending bounds
Create an alpha image the size of the foreground starting bounds which only contains alpha values, initialized to opaque
Set the image's alpha composite rule to keep the minimum value of either its current value or the value of the pixel being drawn to it
while (foreground.size() < ending bounds)
    draw foreground to alpha image
    increase foreground size

To Calculate Which Pixels are Needed for the Background Image

Any pixels in the original background image which are transparent are irrelevant
For all remaining relevant background pixels:
    if (low-res foreground pixel is transparent)
        background pixel is irrelevant
    else if (low-res foreground pixel is opaque and captured alpha pixel is opaque)
        background pixel is irrelevant
    else
        background pixel is relevant

Depending upon the particular image to be encoded in this way, most of the foreground layer pixels will be designated as opaque and need not be resent to the client device 44 to generate the hoverzoom effect.
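
The alpha-capture and relevance steps above can be combined into a single, more concrete routine. The following Java sketch is a simplified, illustrative translation only: it assumes the background, the low-resolution foreground and the captured alpha map all share the same coordinate space and size, samples the zoom animation at a fixed number of sizes, and implements the "minimum alpha" compositing rule manually (names such as BackgroundRelevance and captureMinAlpha are not part of this description).

import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.Arrays;

// Illustrative translation of the pseudocode above; see the assumptions noted in the text.
final class BackgroundRelevance {

    // "To Capture Alpha Information": keep, per pixel, the minimum foreground alpha seen
    // while the foreground grows from its starting bounds to its ending bounds.
    static int[][] captureMinAlpha(BufferedImage foreground, int startW, int startH, int endW, int endH) {
        int[][] minAlpha = new int[startH][startW];
        for (int[] row : minAlpha) {
            Arrays.fill(row, 0xFF);                       // initialized to opaque
        }
        int steps = 16;                                   // assumption: sample the zoom at 16 sizes
        for (int s = 0; s <= steps; s++) {
            int w = startW + (endW - startW) * s / steps;
            int h = startH + (endH - startH) * s / steps;
            BufferedImage frame = new BufferedImage(startW, startH, BufferedImage.TYPE_INT_ARGB);
            Graphics2D g = frame.createGraphics();
            // Centering the growing foreground on the starting bounds is an assumption.
            g.drawImage(foreground, (startW - w) / 2, (startH - h) / 2, w, h, null);
            g.dispose();
            for (int y = 0; y < startH; y++) {
                for (int x = 0; x < startW; x++) {
                    minAlpha[y][x] = Math.min(minAlpha[y][x], frame.getRGB(x, y) >>> 24);
                }
            }
        }
        return minAlpha;
    }

    // "To Calculate Which Pixels are Needed for the Background Image".
    static boolean[][] relevantBackground(BufferedImage background, BufferedImage lowResForeground, int[][] minAlpha) {
        int w = background.getWidth();
        int h = background.getHeight();
        boolean[][] relevant = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int bgAlpha = background.getRGB(x, y) >>> 24;
                int fgAlpha = lowResForeground.getRGB(x, y) >>> 24;
                if (bgAlpha == 0) continue;                              // transparent background: irrelevant
                if (fgAlpha == 0) continue;                              // already visible on screen: irrelevant
                if (fgAlpha == 0xFF && minAlpha[y][x] == 0xFF) continue; // stays covered throughout the zoom
                relevant[y][x] = true;                                   // becomes exposed: must be transmitted
            }
        }
        return relevant;
    }
}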

Hoverzoom processing in accordance with this exemplary embodiment of the present invention is generally illustrated in FIG. 10. Therein, an MPEG (background) version of the image 1000 and an unzoomed version 1002 of the image to be magnified (for example, Apollo 13 in FIG. 1(a)), e.g., PNG or JPEG, are provided. The background image 1000 is combined with the unzoomed version 1002 of the image at step 1004 and transmitted to the client device 44 in the MPEG data stream, after compression at step 1006. The foreground/background map described above is retrieved from storage at step 1008 and used to determine which pixel data associated with the foreground layer and the background layer needs to be transmitted. That data is encoded (compressed) at steps 1010 and 1012, saved as a ZSD image file at step 1014 and transmitted to the client device 44. Although this exemplary embodiment of the present invention transmits this information as scene data (ZSD data) outside of the MPEG data stream, it can alternatively be embedded in the MPEG data stream.

As will be appreciated by reading the foregoing discussion of hoverzoom techniques in accordance with an exemplary embodiment of the present invention, some of the challenges associated with generating sophisticated user interfaces (e.g., which employ zooming) at client devices connected to, for example, a cable network, can be addressed by intelligent selection of an encoding stream for particular data to be transmitted. In the foregoing hoverzoom example, background data was sent using the MPEG encoding stream available in such networks, while the foreground information was sent using a different type of encoding (described above), handled for presentation through the OSD layer. However, exemplary embodiments of the present invention contemplate that other server/client data transfers may benefit from selectively deciding, at one of the upstream nodes which is supplying data to the client device 44, which type of encoding/data stream is appropriate for the data to be transmitted, in particular for data associated with zooming user interfaces.

This general concept is illustrated in FIG. 11. Therein, data is evaluated at block 1100 to determine whether it is first data or second data, and a type of encoding (and associated transmit data stream) is selectively determined for handling that data. First and second data can be different types of data or the same type of data having different characteristics. An example of the former is the hoverzoom data (background data being first data and foreground data being second data). An example of the latter is text. MPEG encoding is not particularly efficient for encoding text and, accordingly, it may be desirable to encode text under certain circumstances using another type of encoding, e.g., if the text to be transmitted is less than a predetermined font size (e.g., 16 point).
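
A minimal sketch of this selection step is shown below, assuming a simple two-way choice between the MPEG stream and an alternate (e.g., OSD/ZSD) encoding; the class name EncodingSelector and the enum values are illustrative, while the 16-point text threshold follows the example given above.

// Sketch of the selection step of FIG. 11; names and thresholds are illustrative assumptions.
final class EncodingSelector {
    enum Encoding { MPEG_STREAM, ALTERNATE_STREAM }

    static final int MIN_MPEG_FONT_POINTS = 16;   // example threshold from the text

    static Encoding selectForImage(boolean isBackgroundLayer) {
        // Background imagery compresses well as MPEG; sparse foreground/overlay data goes elsewhere.
        return isBackgroundLayer ? Encoding.MPEG_STREAM : Encoding.ALTERNATE_STREAM;
    }

    static Encoding selectForText(int fontPointSize) {
        // Small text survives MPEG compression poorly, so send it via the alternate encoding.
        return fontPointSize < MIN_MPEG_FONT_POINTS ? Encoding.ALTERNATE_STREAM : Encoding.MPEG_STREAM;
    }
}

In this arrangement, the decision is made at an upstream node before transmission, so the client simply renders whatever arrives on each stream.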

Systems and methods for processing metadata according to exemplary embodiments of the present invention can be performed by processors executing sequences of instructions contained in a memory device (not shown). Such instructions may be read into the memory device from other computer-readable media, such as secondary data storage device(s). Execution of the sequences of instructions contained in the memory device causes the processor to operate, for example, as described above. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the present invention.

The above-described exemplary embodiments are intended to be illustrative in all respects, rather than restrictive, of the present invention. Thus the present invention is capable of many variations in detailed implementation that can be derived from the description contained herein by a person skilled in the art. For example, although MPEG encoding and MPEG data streams have been described in the foregoing exemplary embodiments, it will be appreciated that different types of encodings and data streams can be substituted therefor, in part or in whole, e.g., video encodings used in Windows Media-based content and the like. Moreover, although (MPEG) image and/or video data is described as being transmitted through all or part of a cable network, the present invention is equally applicable to systems wherein the image and/or video data is available locally, e.g., on a home disk or from a local server. All such variations and modifications are considered to be within the scope and spirit of the present invention as defined by the following claims. No element, act, or instruction used in the description of the present application should be construed as critical or essential to the invention unless explicitly described as such. Also, as used herein, the article "a" is intended to include one or more items.

1. A method for generating a hoverzoom effect on a user interface comprising the steps of: transmitting background layer data and foreground layer data to a client device; displaying a background layer based on said background layer data; identifying a user action associated with said hoverzoom effect; and displaying, in response to said user action, a foreground layer based on said foreground layer data as an overlay on said background layer.
2. The method of claim 1, wherein said user action is rolling a cursor over a displayed image and said foreground layer includes a magnified version of said displayed image.

3. The method of claim 1, further comprising the step of: categorizing each pixel in said foreground layer as either (1) being extractable from said background layer data or (2) needing to be sent to said client device.
4. The method of claim 3, wherein said step of transmitting background layer data and foreground layer data further comprises the step of: selectively transmitting pixel data in said foreground layer based on said categorization step.
5. The method of claim 3, further comprising the steps of: identifying additional background pixels which will be revealed as a result of said hoverzoom effect; and transmitting background layer data associated with said additional background pixels.

6. A client system for generating a hoverzoom effect on a user interface comprising: at least one receiver for receiving background layer data and foreground layer data; and a processor for generating a background layer based on said background layer data, identifying a user action associated with said hoverzoom effect and generating, in response to said user action, a foreground layer based on said foreground layer data as an overlay on said background layer.
7. The system of claim 6, wherein said user action is rolling a cursor over a displayed image and said foreground layer includes a magnified version of a displayed image.

8. The system of claim 6, wherein said receiver also receives a map which categorizes each pixel in said foreground layer as either (1) being extractable from said background layer data or (2) being sent to said client system.
9. The system of claim 8, wherein said received foreground layer data includes only some of the data needed by said processor to generate said foreground layer and said processor extracts additional data from said background layer data.
10. A method for MPEG encoding data to be used in generating user interface screens comprising the steps of: providing a user interface having first and second user interface screens and storing data associated with object locations on said first and second user interface screens; determining motion vectors associated with MPEG data frames using said stored data; sending said motion vectors to an MPEG encoder; and MPEG encoding said data using said motion vectors.
11. The method of claim 10, wherein said step of determining motion vectors associated with MPEG data frames using said stored data further comprises the step of: using said stored data to either (1) determine a motion vector independently of a standard MPEG motion vector search algorithm or (2) selectively employ said standard MPEG motion vector search algorithm to determine said motion vector.

12. The method of claim 10, wherein said step of determining motion vectors associated with MPEG data frames using said stored data further comprises the step of: using said stored data to either (1) determine a motion vector independently of a standard MPEG motion vector search algorithm, (2) selectively employ said standard MPEG motion vector search algorithm to determine said motion vector or (3) reduce a search range of said standard MPEG motion vector algorithm.
13. A system for MPEG encoding data to be used in generating user interface screens comprising: a user interface having first and second user interface screens; a data storage unit for storing data associated with object locations on said first and second user interface screens; a motion estimation hint encoder for determining motion vectors associated with MPEG data frames using said stored data; and an MPEG encoder for encoding data using said motion vectors.
14. The system of claim 13, wherein said motion estimation hint encoder uses said stored data to either (1) determine a motion vector independently of a standard MPEG motion vector search algorithm or (2) selectively command said MPEG encoder to determine said motion vector using said standard MPEG motion vector search algorithm.
15. The system of claim 13, wherein said motion estimation hint encoder uses said stored data to provide said MPEG encoder with a reduced search range in which to employ said standard MPEG motion vector search algorithm.